Identifying Word Translations in Non-Parallel Texts

Author

  • Reinhard Rapp
Abstract

Common algorithms for sentence and word-alignment allow the automatic identification of word translations from parallel texts. This study suggests that the identification of word translations should also be possible with non-parallel and even unrelated texts. The method proposed is based on the assumption that there is a correlation between the patterns of word cooccurrences in texts of different languages.
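
The assumption stated in the abstract can be illustrated with a toy sketch. This is not the procedure from the paper: the window size, the absolute-difference distance, the tiny corpora, and all function names below are assumptions chosen only for illustration. The idea shown is that if co-occurrence patterns correlate across languages, the co-occurrence matrix of one language should match that of the other more closely when the two vocabularies are listed in corresponding order.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=1):
    """Count how often each unordered pair of distinct words occurs
    within `window` tokens of each other (hypothetical helper)."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                counts[frozenset((word, other))] += 1
    return counts


def cooccurrence_matrix(tokens, vocabulary, window=1):
    """Square matrix whose entry [i][j] is the co-occurrence count of
    vocabulary[i] and vocabulary[j]; the diagonal is left at zero."""
    counts = cooccurrence_counts(tokens, window)
    return [
        [counts[frozenset((a, b))] if a != b else 0 for b in vocabulary]
        for a in vocabulary
    ]


def matrix_distance(m1, m2):
    """Sum of absolute differences between two equally sized matrices
    (an assumed distance measure, used here only for illustration)."""
    return sum(
        abs(x - y) for row1, row2 in zip(m1, m2) for x, y in zip(row1, row2)
    )


# Toy, unrelated "corpora" (invented for this sketch, not real data).
english = ["teacher", "school", "blue", "sky", "teacher", "school"]
german = ["Lehrer", "Schule", "blau", "Himmel", "Lehrer", "Schule"]

en_matrix = cooccurrence_matrix(english, ["teacher", "school", "sky"])

# Vocabulary listed in the corresponding order.
de_matched = cooccurrence_matrix(german, ["Lehrer", "Schule", "Himmel"])
# Vocabulary with two words swapped.
de_swapped = cooccurrence_matrix(german, ["Schule", "Lehrer", "Himmel"])

distance_matched = matrix_distance(en_matrix, de_matched)
distance_swapped = matrix_distance(en_matrix, de_swapped)
```

In this toy setting the matched ordering gives the smaller distance, which is the kind of signal the correlation assumption relies on. Real corpora would need much larger vocabularies and some form of normalization of the raw counts.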

Similar resources

Automatic Identification of Word Translations from Unrelated English and German Corpora

Algorithms for the alignment of words in translated texts are well established. However, only recently have new approaches been proposed to identify word translations from non-parallel or even unrelated texts. This task is more difficult, because most statistical clues useful in the processing of parallel texts cannot be applied to non-parallel texts. Whereas for parallel texts in some studies ...

An Approach to Acquire Word Translations from Non-parallel Texts

Few approaches to extract word translations from non-parallel texts have been proposed so far. Researchers have not been encouraged to work on this topic because extracting information from non-parallel corpora is a difficult task producing poor results. Whereas for parallel texts, word translation extraction can reach about 99%, the accuracy for non-parallel texts has been around 72% up to now...

Empirical Methods for Exploiting Parallel Texts

Parallel translations of written texts have long been useful tools for human students of language, and have begun to serve as an intriguing source of data for corpus-based approaches to natural language processing. A source text and its translation can be viewed as a coarse map between the two languages, and an industrious student or clever computer program may wish to refine that mapping so th...

Identifying Word Translation in Non-Parallel Texts

Common algorithms for sentence and word-alignment allow the automatic identification of word translations from parallel texts. This study suggests that the identification of word translations should also be possible with non-parallel and even unrelated texts. The method proposed is based on the assumption that there is a correlation between the patterns of word cooccurrences in texts of diff...

Publication date: 1995